The global smartphone market has seen several market leaders over the past few years. The only two major brands that have remained at the top throughout are Apple and Samsung. The two brands are compared regularly and compete for market share. Samsung was the market leader in the first three quarters of 2021. In September 2021, Apple released its new iPhone 13 generation and gained market share, taking over the market leader position. In February 2022, Samsung released the new model in its Galaxy series. These two phones are the flagships of the current smartphone market.
Source: IDC. (2022). Global smartphone market share from 4th quarter 2009 to 4th quarter 2021 (by vendor). Statista. Statista Inc. Accessed: April 14, 2022.
The Samsung Galaxy S series is the high-end line produced and sold by Samsung, a South Korean multinational electronics company. The S series has 13 generations. Together with the Galaxy Z and the (discontinued) Galaxy Note series, the S series serves as Samsung's flagship line (Wikipedia, n.d.). The S series runs the open-source operating system Android (Samsung).
In February 2022 the S22 was released in three variants: S22, S22 Plus (also: S22+) and S22 Ultra (Wikipedia, n.d.). It comes in the colours shown in the picture below, namely phantom white, burgundy, phantom black and green (Samsung). In addition, the S22 Ultra variant comes with a so-called “S Pen”, which lets the user write on the phone and is recharged inside the phone (Samsung).
Samsung advertises the S22 series with the following features:
"The phone that makes everyday epic
Nightography camera
A battery that lasts the day and beyond
Our fastest chip ever"
(Source: Samsung)
The Apple iPhone series is the only smartphone series produced and sold by Apple Inc., a US-American multinational technology company (Wikipedia).
The generation 13 was released in September 2021 in four variants: iPhone 13, iPhone 13 mini, iPhone 13 Pro and iPhone 13 Pro Max. Apple also sells the iPhone SE alongside them. The regular variant and the mini can be bought in the colours green, pink, blue, midnight, starlight and red. The Pro and Pro Max variants are available in alpine green, silver, gold, graphite and sierra blue. The iPhone SE is available in midnight, starlight and red (Apple).
Apple advertises the iPhone Pro with the following features:
"A dramatically more powerful camera system.
A display so responsive, every interaction feels new again.
The world’s fastest smartphone chip.
Exceptional durability.
And a huge leap in battery life." (Apple)
| | Samsung S22 | Apple iPhone 13 |
|---|---|---|
| Starting Price | $799 | $799 |
| Screen size | 6.1 inches (2340 x 1080) | 6.1 inches (2532 x 1170) |
| Refresh rate | 48Hz-120Hz adaptive | 60Hz |
| CPU | Snapdragon 8 Gen 1 (US); Exynos 2200 (K) | A15 Bionic |
| RAM | 8GB | 4GB (based on teardowns) |
| Storage | 128GB, 256GB | 128GB, 256GB, 512GB |
| Rear cameras | 50MP wide (f/1.8); 12MP ultrawide (f/2.2); 10MP telephoto (f/2.4) with 3x optical zoom | 12MP main (f/1.6), 12MP ultrawide (f/2.4) |
| Front camera | 10MP (f/2.2) | 12MP (f/2.2) |
| Battery size | 3,700 mAh | 3,227 mAh (based on teardowns) |
| Battery life (Hrs:Mins) | 7:51 | 10:33 |
| Charging speeds | 25W wired, 15W wireless | 20W wired; 15W wireless |
| Size | 5.7 x 2.8 x 0.3 inches | 5.8 x 2.8 x 0.3 inches |
| Weight | 5.9 ounces | 6.14 ounces |
| Colors | Black, white, green, pink gold | Black, white, blue, pink, red, green |
Source: https://www.tomsguide.com/face-off/samsung-galaxy-s22-vs-iphone-13
Identification of the strengths and weaknesses of the competitor models, which can help to identify a gap in the market and to enhance one's own business strategy.
Identification of suggestions to customers about the products based on other customers’ opinions.
We used data from the Twitter API and Amazon reviews to address our research objectives.
We used the following methods:
Word Frequencies,
Sentiment Analysis,
Word Correlations,
Clustering and
LDA Topic Modelling.
We decided to extract data from Twitter with the keywords “Samsung S22” and the hashtag “#samsungs22” from 31.03.2022 - 08.04.2022 as we do not have a premium license to get access to the full timeline of tweets on the TwitterAPI.
For the keywords “Samsung S22”, we excluded the users ShopeeID and _arllee as well as all retweets from our query, because several thousand tweets about a competition to win a Samsung mobile phone caused a lot of duplicated data.
In addition, we identified nine users who were posting advertising spam and other non-valuable tweets and had to be excluded: “whitestonedome”, “FromKorea5”, “dome_glass”, “Whitestone_DE”, “whitestone_UK”, “jp_whitestone”, “Whitestone__FR”, “WhitestoneJapan”, “WhitestoneEU”.
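The exclusion rules described above can be sketched in base R. This is only an illustration under assumptions: the miniature data frame and its tweets are invented, the block list is a subset of the accounts named above, and the filter is applied after download rather than inside the API query.

```r
# Hypothetical miniature of the tweet data frame; the column names follow
# the structure printed in this report (text, screenName, isRetweet).
tweets <- data.frame(
  text = c("Loving my new Samsung S22 camera",
           "Join the event to win a Samsung S22!",
           "Best screen protector for #samsungs22",
           "RT Samsung S22 review is out",
           "Samsung S22 battery drains fast"),
  screenName = c("user_a", "ShopeeID", "whitestonedome", "user_b", "user_c"),
  isRetweet = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Accounts to drop (a subset of the accounts named in the text).
blocked_users <- c("ShopeeID", "_arllee", "whitestonedome", "FromKorea5")

# Keep only original tweets from accounts that are not on the block list.
keep <- !tweets$isRetweet & !(tweets$screenName %in% blocked_users)
tweets_clean <- tweets[keep, ]

# Drop exact duplicate texts that survive the account filter.
tweets_clean <- tweets_clean[!duplicated(tweets_clean$text), ]

print(tweets_clean$screenName)
```

Filtering on `screenName` and `isRetweet` keeps the rule set explicit and easy to extend when further spam accounts are found.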
In total we could gather 3846 tweets.
Our initial data frame has 17 attributes. The meaning of each attribute is documented on the Twitter Developer Platform.
The following output shows that the data.frame was correctly formatted:
'data.frame': 3846 obs. of 17 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ text : chr "Can anyone show/tell me how a stock trading platform like #thinkorswim looks on a #samsung #zfold3 \n\n#samsung"| __truncated__ "Read our review of Samsung Galaxy S22; the smart phone of the season - https://t.co/zYpSuPocyZ\n.\n.\n#whatsnew"| __truncated__ "@JeromePolin @samsungID find u bang jerrr #findjerome #samsungs22 #mynewrules https://t.co/FFpojQGj5I" "Ran out of patience with my phone dropping out of network every other hour. Time for an upgrade\n#SamsungS22… h"| __truncated__ ...
$ favorited : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ favoriteCount: int 0 0 0 1 2 0 1 0 0 0 ...
$ replyToSN : chr NA NA "JeromePolin" NA ...
$ created : chr "2022-04-08 04:54:57" "2022-04-07 12:45:20" "2022-04-07 12:07:09" "2022-04-07 08:03:02" ...
$ truncated : logi TRUE TRUE FALSE TRUE TRUE TRUE ...
$ replyToSID : num NA NA 1.51e+18 NA 1.51e+18 ...
$ id : num 1.51e+18 1.51e+18 1.51e+18 1.51e+18 1.51e+18 ...
$ replyToUID : num NA NA 1.04e+18 NA 1.96e+08 ...
$ statusSource : chr "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>" "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>" "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>" "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>" ...
$ screenName : chr "c0vert" "WhatsNewDawg" "person11666" "mini_muzz" ...
$ retweetCount : int 0 0 0 0 0 0 0 0 0 0 ...
$ isRetweet : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ retweeted : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ longitude : num NA NA NA NA NA NA NA NA NA NA ...
$ latitude : num NA NA NA NA NA NA NA NA NA NA ...
We extracted data for the search term “iPhone 13” (excluding retweets) and the hashtag “#iphone13” from the Twitter API. As with the Samsung S22, we had to exclude certain users who were advertising (“whitestonedome”, “FromKorea5”, “domeglassapple”) as well as competitions to win an iPhone by retweeting or copy-pasting a specific text (“Join the event to win an iPhone 13!”). We also had a high number of tweets that listed unrelated brand names, e.g. “rolex iphone” in one tweet.
In total we could gather 8369 tweets.
Our initial data frame has 18 variables, one more than the Samsung data frame because of an additional index column (X.1). The meaning of the Twitter attributes is documented on the Twitter Developer Platform.
The following output shows that the data.frame was correctly formatted:
'data.frame': 8369 obs. of 18 variables:
$ X.1 : int 1 2 3 4 5 6 7 8 9 10 ...
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ text : chr "Apple iPhone 13 Pro Max, Graphite, 256GB -... - https://t.co/WN9QahDmcf #Deals #iPhone_deals #ukdealsonline ht"| __truncated__ "@person_pikin1 @Olamideofficial @kusssman Simple!!\n\n13+11=24\n\nIt’s iPhone 24 pro max" "@GetLocoNow iam an eSports player and I have Android devices it's lagg too much. I hope I'll win iPhone 13 for "| __truncated__ "iphone 13 blue or starlight or pink? which one is better huh?" ...
$ favorited : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ favoriteCount: int 0 0 0 1 0 0 0 0 0 2 ...
$ replyToSN : chr NA "person_pikin1" "GetLocoNow" NA ...
$ created : chr "2022-04-08 12:34:13" "2022-04-08 12:34:09" "2022-04-08 12:31:41" "2022-04-08 12:31:09" ...
$ truncated : logi FALSE FALSE TRUE FALSE TRUE FALSE ...
$ replyToSID : num NA 1.51e+18 NA NA 1.51e+18 ...
$ id : num 1.51e+18 1.51e+18 1.51e+18 1.51e+18 1.51e+18 ...
$ replyToUID : num NA 1.23e+18 9.40e+17 NA 5.42e+08 ...
$ statusSource : chr "<a href=\"https://www.rssground.com\" rel=\"nofollow\">RSS Ground</a>" "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>" "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>" "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>" ...
$ screenName : chr "ukdealsonline" "c_r_u_i_s_e_" "PrinceAlioth" "mmuhdnoahh" ...
$ retweetCount : int 2 0 0 0 0 0 0 0 0 0 ...
$ isRetweet : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ retweeted : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ longitude : num NA NA NA NA NA NA NA NA NA NA ...
$ latitude : num NA NA NA NA NA NA NA NA NA NA ...
While there were only 3846 tweets from users about the Samsung S22, there were about 2.2 times as many tweets (8369) about the Apple iPhone 13 during the period considered.
On 3 out of 9 days the Samsung S22 had more tweets than the iPhone 13. On the other days the iPhone 13 had far more tweets. Surprisingly, the number of tweets and users per day for the Samsung S22 remains roughly constant, while there are only a few tweets about the iPhone 13 in the first 3 days and a high volume of tweets from 3rd April until 7th April 2022. There were not only more tweets but also a larger group of users posting about the iPhone 13. We could not identify a specific reason for this phenomenon, because the last official Apple event was in March 2022 and not within the period we examined.
There is no similar indicator from Google Trends or other sources. Searches for the terms iPhone 13, Samsung S22, Apple and Samsung are relatively constant over time, as the image below shows (screenshot from Google Trends). It can be noted that Apple is looked up more often than Samsung and that the iPhone 13 is more popular on Google Search than the Samsung S22. This gives a first impression that the iPhone 13 generally receives more attention.
From a brand perspective, Samsung has more searches overall than Apple; among the latest models, iPhones have more searches than Samsung mobile phones. This may be because Samsung has a more diverse product range and more series (Korea Expose) than Apple (Apple Wiki), but among the flagship phones of the two brands, the iPhone is getting more attention than the Samsung S22. Apple is more topical in online discussions, which might be due to its marketing strategy, or simply because people like to follow and take part in the discussion of its new product releases.
There are more tweets than users, which indicates that some users post more than one tweet. We calculated a ratio for this: avg_tweets_per_user = tweets per day / users per day.
Interestingly, although there was less conversation about Apple in the first three days, the ratio of tweets per user was higher, so it seems to have been a conversation among a smaller group of users. We see the same for the Samsung S22 between 4th April and 6th April 2022.
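The ratio can be computed per day directly from the `created` and `screenName` columns. The following is a minimal sketch in base R; the five tweets are invented for illustration and are not taken from our data.

```r
# Hypothetical tweets; only the two columns needed for the ratio are kept.
tweets <- data.frame(
  created = c("2022-04-01 08:00:00", "2022-04-01 09:30:00",
              "2022-04-01 10:15:00", "2022-04-02 11:00:00",
              "2022-04-02 12:45:00"),
  screenName = c("user_a", "user_a", "user_b", "user_c", "user_d"),
  stringsAsFactors = FALSE
)

# Reduce the timestamp to its calendar day (first 10 characters).
tweets$day <- substr(tweets$created, 1, 10)

# Tweets per day and distinct users per day.
tweets_per_day <- tapply(tweets$screenName, tweets$day, length)
users_per_day <- tapply(tweets$screenName, tweets$day,
                        function(x) length(unique(x)))

# avg_tweets_per_user = tweets per day / users per day
avg_tweets_per_user <- tweets_per_day / users_per_day
print(round(avg_tweets_per_user, 2))
```

A value of 1 means every user posted exactly once on that day; higher values indicate that fewer users produced the day's tweets.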
We decided to preprocess the data based on the following steps:
All text was converted to lower case, e.g. Hello to hello
All contractions were converted to the longer form, e.g. don’t to do not
All common internet slang was converted to formal English, e.g. TGIF to Thank God it is Friday
Hashtag signs (#) were removed
Word elongation was replaced by the usual word form, e.g. heeeeey to hey
All non-ASCII characters were replaced with an equivalent or removed, e.g. © to (C)
White space within the string was reduced to a single white space
White space at the start and end of the string was removed
“RT”, indicating that it is a retweet, was removed
All links were removed based on the start of “http”
All @username mentions were removed
Punctuation was removed
Stop words were removed based on the RegEx approach
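Several of the steps listed above can be sketched with base R regular expressions alone. The helper `clean_tweet()` and the example tweet are hypothetical; contractions, slang, word elongation and stop words are not covered by this sketch.

```r
# Hypothetical helper covering a subset of the preprocessing steps.
clean_tweet <- function(x) {
  x <- tolower(x)                              # lower case
  x <- gsub("\\brt\\b", " ", x, perl = TRUE)   # drop the retweet marker
  x <- gsub("http[^[:space:]]+", " ", x)       # drop links starting with http
  x <- gsub("@[[:alnum:]_]+", " ", x)          # drop @username mentions
  x <- gsub("#", "", x, fixed = TRUE)          # drop the hashtag sign, keep the word
  x <- gsub("[[:punct:]]", " ", x)             # replace punctuation with a space
  x <- gsub("[[:space:]]+", " ", x)            # collapse repeated white space
  trimws(x)                                    # trim both ends
}

cleaned <- clean_tweet(
  "RT @samsungID: Loving the #SamsungS22 camera!!  https://t.co/abc123"
)
print(cleaned)
```

The order matters in this sketch: links and mentions are removed before punctuation, because stripping punctuation first would break a URL or a mention into fragments that the later patterns no longer recognise.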
wordcloud(word_s, min.freq = 60, max.words = 40, random.order = FALSE, color= pal)
Samsung S22 users mentioned Samsung, Galaxy (which is the series of the S22) and Ultra (which is a specific model). The Ultra model is comparable to the iPhone 13 Pro.
Also iPhone, Pro, Max, Note & OnePlus are mentioned which are other comparable smartphones in the market.
Other words that are often mentioned belong to specifications that users talked about:
android, update, security
camera, pixel, video
features
mediatek
screen
amp
case
wordcloud(word_a, min.freq = 60, max.words = 40, random.order = FALSE, color= pal)
We can see that the Apple iPhone is mentioned in the variants Pro, Mini and Max. Pro seems to be the most important one, followed by Max and then Mini.
As with the Samsung S22, the competitor appears here as well: the iPhone 13 is mentioned together with “Ultra”, “Galaxy” and “Samsung”.
The specifications of the phone that users talked about were:
pixel, camera
battery, amp
price
case
green
For iPhone 13, we can notice that some adjectives and verbs were mentioned a lot:
available, buy, win, free, still, now, good, better, best, like, will, get, want, need, can
We will identify topics further during our analysis using LDA Topic Modelling.
Both models are mentioned together with their camera and pixel, but only Samsung has the word “video” mentioned a lot. Our pre-research showed that both companies advertise their smartphones based on the camera. Neither of them highlights video recording in its main advertisement.
For Samsung S22 users it seems to be important that a new update has been released. A new Android OS update for the S22 series was released in April 2022; it updated security measures and brought new features.
For the Apple iPhone 13, green was talked about a lot. Apple added the green colour options to the generation 13 in March 2022, shortly before our collection period, so it makes sense that people discuss it.
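The word frequencies behind these observations can be reproduced in principle with a simple token count. The sketch below uses base R and three invented, already preprocessed documents; it is not the code that produced the word clouds.

```r
# Hypothetical preprocessed tweets.
docs <- c("samsung s22 ultra camera update",
          "s22 camera video quality",
          "new android update for s22")

# Split every document on white space and pool the tokens.
tokens <- unlist(strsplit(docs, "[[:space:]]+"))

# Count the tokens and sort by descending frequency.
freq <- sort(table(tokens), decreasing = TRUE)

# Keep only words that occur at least twice, similar in spirit to the
# min.freq argument of wordcloud().
frequent <- freq[freq >= 2]
print(frequent)
```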
We need to preprocess data this time additionally with the following steps:
Emojis were replaced by their word form, e.g. a smiling emoji to smiling
Emoji identifiers were replaced by their word form, e.g. :-) to smiling
stopwords_regex <- paste(stopwords('en'), collapse = '\\b|\\b')
stopwords_regex <- paste0('\\b', stopwords_regex, '\\b')
Samsung_df <- samsung_df$text %>%
str_to_lower() %>% #all text to lower case
replace_contraction() %>% #replaces contractions to longer form
replace_internet_slang() %>% #replaces common internet slang
replace_hash(replacement = "") %>% #removes hashtags
replace_word_elongation() %>% #removes word elongation, e.g. "heeeeey" to "hey"
replace_emoji() %>% #replaces emojis with the word form
replace_emoji_identifier() %>% #replaces emoji identifiers to word form
replace_non_ascii() %>% #replaces common non-ASCII characters.
str_squish() %>% #reduces repeated whitespace inside a string
str_trim() %>% #removes whitespace from start and end of string
{gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", .)} %>% # remove "RT @user" and "via @user" prefixes
{gsub("http[^[:blank:]]+", "", .)} %>% # remove links that start with http
{gsub("@\\u+", "", .)} %>% # remove names
{gsub('@\\w+', '', .)} %>% # remove @username mentions
{gsub("[[:punct:]]", " ", .)} %>% # replace punctuation with a space
{gsub("[^[:alnum:]]", " ", .)} %>% # replace any remaining non-alphanumeric character with a space
{gsub("pro", " ", .)} %>% # removes the string "pro" wherever it occurs, because it has a different context here
stringr::str_replace_all(stopwords_regex, '') %>% # remove stop words
make_plural() %>% # make plural form
unique() # remove duplicates
tail(Samsung_df)
[1] "samsung galaxy s22 s22 plus s22 ultra overheating issue find fix overheating issues s"
[2] " samsung s22 giveaway bro frowning faces"
[3] "greetings samsung galaxy s22 ultra s"
[4] " want samsung s22 ultras"
[5] " must 12 things new samsung galaxy s22 s22 s"
[6] "oneplus pad amp 10 launch honor magic x14 amp x15 samsung s22 ultra 1tb digital payments fire s"
[1] "hi friends confetti ball full coverage soft tpu case iphone 13 smiling cat face heart eyes smiling cat face heart eyes many colors hibiscus cherry blossom blossom 0 8 thumbs thumbs thumbs wha s"
[2] "first world dilemma get wait 8 fritz m looking s"
[3] " 1 issues downloading new s"
[4] "cute geometry art label transparent korean phone case iphone s"
[5] "happy adventurer jun087 galaxy s21 artscase s"
[6] " lock apps iphone passcode using shortcuts hindi s"
In the first step, we wanted to identify the general sentiment for each of the two brands. Because the data sets differ in size (the Apple iPhone 13 data set is larger than the Samsung S22 one), we could not compare them in absolute numbers but had to compare relative values.
At first glance, sentiments are similar for both smartphone models, and both are discussed with highly positive sentiments. This means that users generally speak positively about both models.
The Samsung S22 has more trust and anticipation, while the iPhone 13 brings more joy to its users but also more sadness, surprise, fear and disgust.
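Comparing relative instead of absolute values amounts to dividing each sentiment count by the brand's own total. A minimal sketch in base R follows; the counts are invented for illustration and are not the figures from our analysis.

```r
# Hypothetical sentiment counts; these are NOT the results of this report.
counts <- data.frame(
  sentiment = c("positive", "negative", "trust", "joy"),
  apple = c(400, 200, 250, 150),
  samsung = c(200, 80, 140, 80)
)

# Convert absolute counts into each brand's share of its own total,
# which makes data sets of different size comparable.
counts$apple_share <- counts$apple / sum(counts$apple)
counts$samsung_share <- counts$samsung / sum(counts$samsung)

print(round(counts[, c("apple_share", "samsung_share")], 3))
```

In this invented example Apple has twice as many positive words in absolute terms, yet both brands have the same positive share, which is why absolute counts alone would be misleading.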
###### Sentimentr score #######
sentimentr_apple <- sentiment_by(Apple_df, by=NULL)
# You can see the sentiment per tweet ID:
ggplot(data=sentimentr_apple,aes(x=element_id,y=ave_sentiment, color=ave_sentiment))+
geom_line()
# You can see the summary of minimum, IQR, median and mean for all variables. For us, word_count and ave_sentiment are mostly interesting:
summary(sentimentr_apple)
element_id word_count sd ave_sentiment
Min. : 1 Min. : 1.0 Min. : NA Min. :-1.63096
1st Qu.:1829 1st Qu.: 7.0 1st Qu.: NA 1st Qu.: 0.00000
Median :3658 Median : 10.0 Median : NA Median : 0.04121
Mean :3658 Mean : 10.3 Mean :NaN Mean : 0.11135
3rd Qu.:5486 3rd Qu.: 13.0 3rd Qu.: NA 3rd Qu.: 0.26726
Max. :7314 Max. :135.0 Max. : NA Max. : 1.69378
NA's :7314
# You can see the variance and standard deviation for the Sentiment Score & Word Count:
data.frame(" "= c("Average", "Variance", "Standard Deviation"),
"Sentiment Score" = c(round(mean(sentimentr_apple$ave_sentiment),2), round(var(sentimentr_apple$ave_sentiment),2),
round(sd(sentimentr_apple$ave_sentiment),2)),
"Word Count"=c(round(mean(sentimentr_apple$word_count),2), round(var(sentimentr_apple$word_count),2),
round(sd(sentimentr_apple$word_count),2)))
##### Sentimentr score ######
sentimentr_samsung <- sentiment_by(Samsung_df, by=NULL)
# You can see the sentiment per tweet ID:
ggplot(data=sentimentr_samsung,aes(x=element_id,y=ave_sentiment, color=ave_sentiment))+
geom_line()
# You can see the summary of minimum, IQR, median and mean for all variables. For us, word_count and ave_sentiment are mostly interesting:
summary(sentimentr_samsung)
element_id word_count sd ave_sentiment
Min. : 1 Min. : 1.00 Min. : NA Min. :-1.06905
1st Qu.: 766 1st Qu.: 9.00 1st Qu.: NA 1st Qu.: 0.00000
Median :1531 Median :11.00 Median : NA Median : 0.06682
Mean :1531 Mean :11.24 Mean :NaN Mean : 0.11337
3rd Qu.:2296 3rd Qu.:14.00 3rd Qu.: NA 3rd Qu.: 0.25298
Max. :3061 Max. :35.00 Max. : NA Max. : 1.59250
NA's :3061
# You can see the variance and standard deviation for the Sentiment Score & Word Count:
data.frame(" "= c("Average", "Variance", "Standard Deviation"),
"Sentiment Score" = c(round(mean(sentimentr_samsung$ave_sentiment),2), round(var(sentimentr_samsung$ave_sentiment),2),
round(sd(sentimentr_samsung$ave_sentiment),2)),
"Word Count"=c(round(mean(sentimentr_samsung$word_count),2), round(var(sentimentr_samsung$word_count),2),
round(sd(sentimentr_samsung$word_count),2)))
Samsung S22 tweets were on average 11.2 words long, while Apple iPhone 13 tweets had 10.3 words. The tweet with the maximum number of words was about the iPhone 13, with 135 identified words. The longest Samsung S22 tweet, with 35 words, is considerably shorter.
The sentimentr package in R estimates the sentiment polarity by sentence. The average sentiment for the iPhone 13 was more widely distributed, ranging from -1.6 to +1.7 with a mean of 0.11, while Samsung's sentiment polarity ranged from -1.1 to +1.6 with a mean of 0.11. The wider spread in the data is consistent with our findings from the plot “Relative Sentiment Score based on Tweets about Apple and Samsung” earlier. We can see from this that iPhone 13 users are in general more emotional, in both a negative and a positive direction, than Samsung S22 users. We assume that people are emotionally more dependent on their iPhone 13 than on their Samsung S22.
To see the sentiments per tweet, generate these HTML files:
The word scandal is related to the GOS app that slowed Samsung phones down on purpose to save battery life without notifying the user about it, affecting the performance of over 10 thousand apps. This scandal seems to have the most negative impact on the sentiments of Samsung users.
Positive words for Apple are distributed more evenly, apart from the first one, while for Samsung the weightage of all positive words other than the first two is relatively small. The highest-ranked word for both is “new”, which indicates that people are discussing the newest model a lot. Most positive words for Apple do not seem to relate to how people feel about the newest model specifically but are very general expressions, while Samsung receives more practical comments.
Negative words for Apple are mainly non-directional and read more like an emotional vent, which gives little information about why people feel negatively about Apple, while we can clearly see that Samsung is affected by the scandal, since many of its negative words relate to it.
library(tm)
# Build a term-document matrix from the Samsung words
corpus <- Corpus(VectorSource(word_s))
tdm_s <- TermDocumentMatrix(corpus)
tdm_s2 <- removeSparseTerms(tdm_s, sparse = 0.95)
ms2 <- as.matrix(tdm_s2)
# cluster terms
distMatrix_s <- dist(scale(ms2))
fit_s <- hclust(distMatrix_s, method = "ward.D")
plot(fit_s)
rect.hclust(fit_s, k = 9, border = "red") # cut tree into 9 clusters
# Build a term-document matrix from the Apple words
corpus <- Corpus(VectorSource(word_a))
tdm_a <- TermDocumentMatrix(corpus)
tdm_a2 <- removeSparseTerms(tdm_a, sparse = 0.95)
ma2 <- as.matrix(tdm_a2)
# cluster terms
distMatrix_a <- dist(scale(ma2))
fit_a <- hclust(distMatrix_a, method = "ward.D")
plot(fit_a)
rect.hclust(fit_a, k = 9, border = "red") # cut tree into 9 clusters
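To make the clustering step easier to follow, the sketch below applies the same sequence (`scale()`, `dist()`, `hclust()` with `"ward.D"`) to a small invented term-document matrix and reads the cluster membership with `cutree()` instead of drawing rectangles. The matrix values and the choice of two clusters are assumptions for illustration only.

```r
# Toy term-document matrix: rows are terms, columns are documents.
tdm <- matrix(
  c(3, 2, 0, 0,
    2, 3, 0, 0,
    0, 0, 4, 3,
    0, 0, 3, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("camera", "pixel", "battery", "charge"),
                  paste0("doc", 1:4))
)

# scale() standardises each column (document); dist() then measures the
# Euclidean distance between the term rows.
dist_matrix <- dist(scale(tdm))

# Ward clustering of the terms.
fit <- hclust(dist_matrix, method = "ward.D")

# Cut the tree into two clusters and show the assignment per term.
groups <- cutree(fit, k = 2)
print(groups)
```

Terms that occur in the same documents end up in the same cluster, which is the property the dendrograms above rely on.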
$CTM
Topic 1 Topic 2 Topic 3
[1,] "new" "phone" "seri"
[2,] "phone" "get" "case"
[3,] "seri" "camera" "use"
[4,] "can" "now" "featur"
[5,] "updat" "plus" "camera"
[6,] "one" "seri" "android"
[7,] "best" "case" "video"
[8,] "smartphon" "just" "best"
[9,] "get" "will" "will"
[10,] "just" "android" "buy"
$VEM
Topic 1 Topic 2 Topic 3
[1,] "phone" "phone" "new"
[2,] "seri" "camera" "seri"
[3,] "new" "case" "phone"
[4,] "case" "smartphon" "get"
[5,] "can" "plus" "camera"
[6,] "just" "featur" "android"
[7,] "note" "use" "one"
[8,] "updat" "cover" "best"
[9,] "featur" "see" "use"
[10,] "one" "april" "now"
$VEM_Fixed
Topic 1 Topic 2 Topic 3
[1,] "phone" "phone" "new"
[2,] "seri" "camera" "seri"
[3,] "new" "case" "phone"
[4,] "case" "smartphon" "get"
[5,] "can" "plus" "camera"
[6,] "just" "featur" "android"
[7,] "note" "use" "one"
[8,] "updat" "cover" "best"
[9,] "featur" "april" "use"
[10,] "one" "see" "now"
$Gibbs
Topic 1 Topic 2 Topic 3
[1,] "buy" "phone" "amp"
[2,] "note" "seri" "launch"
[3,] "know" "new" "batteri"
[4,] "call" "camera" "photo"
[5,] "month" "case" "charg"
[6,] "fold" "get" "ever"
[7,] "also" "use" "wait"
[8,] "pre" "best" "experi"
[9,] "ram" "updat" "work"
[10,] "chipset" "plus" "week"
CTM.1 VEM.1 VEM_Fixed.1 Gibbs.1
3 3 3 2
The topics around “seri”, “now”, “camera” and “update” relate to the S22 Ultra’s April patch, which introduced many camera-related features.
$CTM
Topic 1 Topic 2 Topic 3
[1,] "case" "case" "phone"
[2,] "phone" "use" "can"
[3,] "get" "one" "green"
[4,] "new" "want" "new"
[5,] "like" "get" "got"
[6,] "mini" "buy" "know"
[7,] "price" "plus" "just"
[8,] "just" "just" "camera"
[9,] "good" "new" "mini"
[10,] "use" "phone" "buy"
$VEM
Topic 1 Topic 2 Topic 3
[1,] "phone" "new" "phone"
[2,] "get" "case" "just"
[3,] "case" "get" "camera"
[4,] "green" "now" "want"
[5,] "can" "use" "case"
[6,] "got" "want" "one"
[7,] "new" "give" "buy"
[8,] "buy" "will" "need"
[9,] "one" "upgrad" "mini"
[10,] "mini" "amp" "can"
$VEM_Fixed
Topic 1 Topic 2 Topic 3
[1,] "phone" "new" "phone"
[2,] "get" "case" "just"
[3,] "case" "get" "camera"
[4,] "green" "now" "want"
[5,] "can" "use" "case"
[6,] "got" "want" "one"
[7,] "new" "give" "buy"
[8,] "buy" "will" "need"
[9,] "one" "upgrad" "mini"
[10,] "mini" "amp" "can"
$Gibbs
Topic 1 Topic 2 Topic 3
[1,] "phone" "switch" "compar"
[2,] "case" "amaz" "news"
[3,] "get" "face" "run"
[4,] "new" "caus" "now"
[5,] "just" "hold" "box"
[6,] "use" "shockproof" "bgmi"
[7,] "can" "volum" "believ"
[8,] "buy" "night" "expens"
[9,] "want" "guess" "sim"
[10,] "green" "luxuri" "whatsapp"
CTM.1 VEM.1 VEM_Fixed.1 Gibbs.1
3 1 1 1
For further analysis, we decided to analyse the Amazon product reviews for both phones in order to understand whether the sentiments of customers on Twitter and Amazon match.
The first step was to collect the data. We faced a similar imbalance here: the Samsung S22 had very few reviews, while the iPhone 13 had a large number of reviews. To collect more data, we used Amazon reviews from different country websites: UK, India, USA and Australia. Only English reviews were selected for the analysis.
Note that there is only one review section for all variants of the Samsung S22 series. The same applies to all variants of the iPhone 13 series.
The following R code was used to collect data from the different Amazon websites (<https://martinctc.github.io/blog/vignette-scraping-amazon-reviews-in-r/>).
# #Scraping Product Reviews from Amazon
#
# #import libraries
# library(rvest)
# library(stringr)
# library(xml2)
# library(tidyverse)
#
#
# scrape_amazon <- function(ASIN, page_num){
#
# #amazon_uk
# #url_reviews <- paste0("https://www.amazon.co.uk/product-reviews/",ASIN,"/?pageNumber=",page_num)
# #amazon.com
# #url_reviews <- paste0("https://www.amazon.com/product-reviews/",ASIN,"/?pageNumber=",page_num)
# #amazon.in
# #url_reviews <- paste0("https://www.amazon.in/product-reviews/",ASIN,"/?pageNumber=",page_num)
#
# #amazon.com.au
# url_reviews <- paste0("https://www.amazon.com.au/product-reviews/",ASIN,"/?pageNumber=",page_num)
#
# doc <- read_html(url_reviews) # Assign results to `doc`
#
# # Review Title
# doc %>%
# html_nodes("[class='a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold']") %>%
# html_text() -> review_title1
#
# review_title <- str_squish(review_title1)
#
# # Review Text
# doc %>%
# html_nodes("[class='a-size-base review-text review-text-content']") %>%
# html_text() -> review_text1
#
# review_text <- str_squish(review_text1)
#
# # Number of stars in review
# doc %>%
# html_nodes("[data-hook='review-star-rating']") %>%
# html_text() -> review_star1
#
# review_star <- str_squish(review_star1)
#
# # Date
#
# doc %>%
# html_nodes("[data-hook='review-date']") %>%
# html_text() -> review_date1
# review_date <- str_squish(review_date1)
#
# # Return a tibble
# tibble(review_title,
# review_text,
# review_star,
# review_date,
# page = page_num) %>% return()
# }
#
#
# ASIN <- "B09MW17JQY" # Specify ASIN
# page_range <- 1 # Pages to scrape, e.g. 1:10 for pages 1 to 10
#
# match_key <- tibble(n = page_range,
# key = sample(page_range,length(page_range)))
#
# lapply(page_range, function(i){
# j <- match_key[match_key$n==i,]$key
#
# message("Getting page ",i, " of ",length(page_range), "; Actual: page ",j) # Progress bar
#
# Sys.sleep(3) # Take a three second break
#
# if((i %% 3) == 0){ # After every three scrapes... take another two second break
#
# message("Taking a break...") # Prints a 'taking a break' message on your console
#
# Sys.sleep(2) # Take an additional two second break
# }
# scrape_amazon(ASIN = ASIN, page_num = j) # Scrape
# }) -> output_list
#
#
# write.csv(output_list,"C:\\Users\\Shruthi\\OneDrive\\Documents\\Semester 2\\Social Media Analytics\\Group Assignment\\Samsung_Australia.csv",row.names = FALSE)
[1] 110 5
review_title review_text review_star review_date page
Length:110 Length:110 Length:110 Length:110 Min. : 1.000
Class :character Class :character Class :character Class :character 1st Qu.: 2.000
Mode :character Mode :character Mode :character Mode :character Median : 5.000
Mean : 5.091
3rd Qu.: 8.000
Max. :10.000
[1] 507 5
review_title review_text review_star review_date page
Length:507 Length:507 Length:507 Length:507 Min. : 1.00
Class :character Class :character Class :character Class :character 1st Qu.: 6.00
Mode :character Mode :character Mode :character Mode :character Median :15.00
Mean :17.09
3rd Qu.:28.00
Max. :40.00
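The summaries show that `review_star` is stored as character. If the ratings are needed as numbers, the leading figure can be extracted with a regular expression. This is a sketch under the assumption that the scraped strings look like "5.0 out of 5 stars"; the three values below are invented.

```r
# Hypothetical raw values as they might appear in the review_star column.
review_star <- c("5.0 out of 5 stars",
                 "3.0 out of 5 stars",
                 "1.0 out of 5 stars")

# Keep the leading number of each string and convert it to numeric.
stars <- as.numeric(sub("^([0-9.]+).*$", "\\1", review_star))

print(mean(stars))
```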
samsung.reviews.text <- samsung.reviews$review_text %>%
str_to_lower() %>% #all text to lower case
replace_contraction() %>% #replaces contractions to longer form
replace_internet_slang() %>% #replaces common internet slang
replace_hash(replacement = "") %>% #removes hashtags
replace_word_elongation() %>% #removes word elongation, e.g. "heeeeey" to "hey"
str_squish() %>% #reduces repeated whitespace inside a string
str_trim() %>% #removes whitespace from start and end of string
{gsub("(RT|via)((?:\\b\\W*@\\w+)+)","",.)} %>% #remove RT (retweets)
{gsub("http[^[:blank:]]+","",.)} %>% #remove links that start with http
{gsub("@\\u+","",.)} %>% #remove names
{gsub('@\\w+', '', .)} %>% # remove at people
{gsub("[[:punct:]]"," ",.)} %>%#remove punctuation
{gsub("[^[:alnum:]]"," ",.)}%>%#remove punctuation
str_replace_all("\\bissue\\b", "issues") %>% # map the singular "issue" to "issues" without touching existing plurals
removeNumbers() %>%
removeWords(stopwords("english")) %>%
make_plural() %>%
stringr::str_replace_all(stopwords_regex, '') %>% #remove stop words
unique() #remove duplicates
samsung.reviews.text <- data.frame(samsung.reviews.text)
iphone.reviews.text <- iphone.reviews$review_text %>%
str_to_lower() %>% #all text to lower case
replace_contraction() %>% #replaces contractions to longer form
replace_internet_slang() %>% #replaces common internet slang
replace_hash(replacement = "") %>% #removes hashtags
replace_word_elongation() %>% #removes word elongation, e.g. "heeeeey" to "hey"
str_squish() %>% #reduces repeated whitespace inside a string
str_trim() %>% #removes whitespace from start and end of string
{gsub("(RT|via)((?:\\b\\W*@\\w+)+)","",.)} %>% #remove RT (retweets)
{gsub("http[^[:blank:]]+","",.)} %>% #remove links that start with http
{gsub("@\\u+","",.)} %>% #remove names
{gsub('@\\w+', '', .)} %>% # remove at people
{gsub("[[:punct:]]"," ",.)} %>%#remove punctuation
{gsub("[^[:alnum:]]"," ",.)}%>%#remove punctuation
removeNumbers() %>%
removeWords(stopwords("english")) %>%
str_replace_all("\\bissue\\b", "issues") %>% # map the singular "issue" to "issues" without touching existing plurals
make_plural() %>%
stringr::str_replace_all(stopwords_regex, '') %>% #remove stop words
unique() #remove duplicates
iphone.reviews.text <- data.frame(iphone.reviews.text)
Both products have in common that people mostly review the camera of the phone. For the Samsung S22 this is followed by the display and screen. For the Apple iPhone 13 this is followed by the battery. The screen is also mentioned in the iPhone 13 reviews, while battery life is rarely mentioned for the Samsung S22.
For Apple, words like “amazing” and “nice” are frequently mentioned, while Samsung has “issues”. On the other hand, the Samsung S22 gets attention for its performance, while the iPhone 13 gets attention for its price.
# Term correlations
# Pairwise correlation of words appearing together in the same review
samsung.correlation.terms <- samsung.reviews.text %>%
mutate(review = row_number()) %>% #use the row number as review id
unnest_tokens(word, samsung.reviews.text) %>% #one word per row
filter(!word %in% stop_words$word) %>%
group_by(word) %>%
filter(n() >= 7) %>% #keep words that occur at least 7 times
pairwise_cor(word, review, sort = TRUE)
samsung.correlation.terms
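For binary "word appears in review" indicators, the correlation reported by pairwise_cor() is the phi coefficient, that is, the Pearson correlation of the 0/1 indicator columns. The sketch below reproduces that idea in base R on hypothetical toy reviews (not the scraped data), so the meaning of the correlation values is easier to see.

```r
# Toy reviews (hypothetical, already cleaned and lower-cased)
reviews <- c(
  "battery life great",
  "battery life poor",
  "camera great",
  "camera screen great"
)

tokens <- strsplit(reviews, " ")
vocab  <- sort(unique(unlist(tokens)))

# Binary review-by-word matrix: 1 if the word occurs in the review, else 0
dtm <- t(sapply(tokens, function(words) as.integer(vocab %in% words)))
colnames(dtm) <- vocab

# Pearson correlation of binary columns = phi coefficient
phi <- cor(dtm)
print(round(phi, 2))
```

Words that always occur together (here "battery" and "life") get a correlation of 1, and words that never share a review (here "battery" and "camera") get a negative value.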
library(ggraph)
library(igraph)
samsung.correlation.terms %>%
filter(correlation >= 0.55) %>%
graph_from_data_frame() %>%
ggraph(layout = "igraph", algorithm = "kk") +
geom_edge_link(aes(alpha = correlation),
show.legend = FALSE)+
geom_node_point(color = "lightblue", size = 2) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()+
ggtitle("Correlation of terms in Samsung S22 Reviews")
The term “snapdragon” refers to the Qualcomm Snapdragon 8 Gen 1, the processor used in the US models of the S22. In several other markets, the S22 instead ships with Samsung’s own Exynos 2200 processor.
“Fingerprint reader” refers to Samsung’s feature for unlocking the phone. In contrast to the iPhone 13, which relies on Face ID, the S22 still has a fingerprint reader; it is not a button-based scanner but an ultrasonic sensor under the display that identifies the fingerprint.
The terms “software”, “issues” and “app” could be related to the controversy around the Game Optimizing Service (GOS) app, which was reported to throttle the performance of more than 10,000 apps.
Battery life of Samsung S22
# Term correlations
# Pairwise correlation of words appearing together in the same review
apple.correlation.terms <- iphone.reviews.text %>%
mutate(review = row_number()) %>% #use the row number as review id
unnest_tokens(word, iphone.reviews.text) %>% #one word per row
filter(!word %in% stop_words$word) %>%
group_by(word) %>%
filter(n() >= 5) %>% #keep words that occur at least 5 times
pairwise_cor(word, review, sort = TRUE)
apple.correlation.terms
library(ggraph)
library(igraph)
apple.correlation.terms %>%
filter(correlation >= 0.55) %>%
graph_from_data_frame() %>%
ggraph(layout = "igraph", algorithm = "kk") +
geom_edge_link(aes(alpha = correlation),
show.legend = FALSE)+
geom_node_point(color = "lightblue", size = 2) +
geom_node_text(aes(label = name), repel = TRUE) +
theme_void()+
ggtitle("Correlation of terms in Apple iPhone 13 Reviews")
# Next, compute the centrality measures of the network
# Degree: the number of edges adjacent to a node (a measure of direct influence)
deg_s <- degree(bigrams.network_s, mode = "all")
# K-core decomposition identifies the core and the periphery of the network. A k-core is a maximal subgraph in which every node has a degree of at least k.
core_s <- coreness(bigrams.network_s, mode = "all")
# Betweenness measures brokerage or gatekeeping potential. It is (approximately) the number of shortest paths between other nodes that pass through a particular node.
betw_s <- betweenness(bigrams.network_s)
# Eigenvector centrality measures being well-connected to the well-connected (first eigenvector of the graph adjacency matrix). It is most meaningful for undirected networks; directed = TRUE takes edge direction into account.
eigen_s <- eigen_centrality(bigrams.network_s, directed = TRUE)
members_s <- cluster_walktrap(bigrams.network_s)
library(igraph)
# Passing remove.multiple = FALSE and remove.loops = TRUE explicitly raised an error, so the defaults are used
bigrams.network_s <- simplify(bigrams.network_s)
V(bigrams.network_s)$color <- members_s$membership+1
# Using "Coreness" as size
# Coreness: the largest k for which the node still belongs to the k-core (core vs. periphery)
plot(bigrams.network_s,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.9,
vertex.label.dist = 0,
vertex.frame.color = 0,
vertex.size = core_s*10,
edge.arrow.size = 0.01,
edge.curved = 0.7,
edge.color = "gray",
main = "Bigram Communities (Samsung)"
)
mtext("Coreness")
# Other size mappings we tried; coreness worked best for our networks
# Using "Degree" as size
# Degree: number of edges of the node (in-degree: prestige)
#
# plot(bigrams.network_s,
# layout = layout_with_fr,
# vertex.label.color = "black",
# vertex.label.cex = 0.6,
# vertex.label.dist = 0,
# vertex.frame.color = 0,
# vertex.size = deg_s,
# edge.arrow.size = 0.1,
# edge.curved = 1,
# edge.color = "gray",
# main = "Bigram Communities (Samsung)"
# )
# mtext("Degree")
#
# # Using "Eigenvector Centrality" as size
# # centrality (the most connected words)
# plot(bigrams.network_s,
# layout = layout_with_fr,
# vertex.label.color = "black",
# vertex.label.cex = 0.6,
# vertex.label.dist = 0,
# vertex.size = eigen_s$vector*20,
# edge.arrow.size = 0.1,
# edge.curved = 1,
# edge.color = "gray",
# main = "Bigram Communities (Samsung)"
# )
# mtext("Eigenvector Centrality")
#
# # Using "Betweenness" as size
# # Betweenness: (weighted) number of shortest paths going through the node
# plot(bigrams.network_s,
# layout = layout_with_fr,
# vertex.label.color = "black",
# vertex.label.cex = 0.6,
# vertex.label.dist = 0,
# vertex.size = betw_s,
# edge.arrow.size = 0.1,
# edge.curved = 1,
# edge.color = "gray",
# main = "Bigram Communities (Samsung)"
# )
# mtext("Betweenness")
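To make the centrality measures used above more concrete, the following minimal sketch computes degree, coreness and betweenness on a small toy graph. The graph (a triangle a-b-c with an extra node d attached to c) is hypothetical and unrelated to the bigram networks; it only assumes that the igraph package is installed.

```r
library(igraph)

# Toy undirected graph: triangle a-b-c plus a pendant node d attached to c
edges <- data.frame(
  from = c("a", "b", "a", "c"),
  to   = c("b", "c", "c", "d")
)
g <- graph_from_data_frame(edges, directed = FALSE)

deg  <- degree(g)      # number of adjacent edges per node
core <- coreness(g)    # k-core index per node
betw <- betweenness(g) # shortest paths passing through each node

print(data.frame(degree = deg, coreness = core, betweenness = betw))
```

Node c has the highest degree and is the only node with non-zero betweenness, because every shortest path to d passes through it; d, attached by a single edge, falls into the periphery with the lowest coreness.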
# Next, compute the centrality measures of the network
# Degree: the number of edges adjacent to a node (a measure of direct influence)
deg_a <- degree(bigrams.network_a, mode = "all")
# K-core decomposition identifies the core and the periphery of the network. A k-core is a maximal subgraph in which every node has a degree of at least k.
core_a <- coreness(bigrams.network_a, mode = "all")
# Betweenness measures brokerage or gatekeeping potential. It is (approximately) the number of shortest paths between other nodes that pass through a particular node.
betw_a <- betweenness(bigrams.network_a)
# Eigenvector centrality measures being well-connected to the well-connected (first eigenvector of the graph adjacency matrix). It is most meaningful for undirected networks; directed = TRUE takes edge direction into account.
eigen_a <- eigen_centrality(bigrams.network_a, directed = TRUE)
members_a <- cluster_walktrap(bigrams.network_a)
library(igraph)
# Passing remove.multiple = FALSE and remove.loops = TRUE explicitly raised an error, so the defaults are used
bigrams.network_a <- simplify(bigrams.network_a)
V(bigrams.network_a)$color <- members_a$membership+1
# Using "Coreness" as size
# Coreness: the largest k for which the node still belongs to the k-core (core vs. periphery)
plot(bigrams.network_a,
layout = layout_with_fr,
vertex.label.color = "black",
vertex.label.cex = 0.6,
vertex.label.dist = 0,
vertex.frame.color = 0,
vertex.size = core_a*10,
edge.arrow.size = 0.1,
edge.curved = 1,
edge.color = "gray",
main = "Bigram Communities (iPhone 13)"
)
mtext("Coreness")
# Other size mappings we tried; coreness worked best for our networks
# # Using "Degree" as size
# # Degree: number of edges of the node (in-degree: prestige)
#
# plot(bigrams.network_a,
# layout = layout_with_fr,
# vertex.label.color = "black",
# vertex.label.cex = 0.6,
# vertex.label.dist = 0,
# vertex.frame.color = 0,
# vertex.size = deg_a,
# edge.arrow.size = 0.1,
# edge.curved = 1,
# edge.color = "gray",
# main = "Bigram Communities (iPhone 13)"
# )
# mtext("Degree")
#
# # Using "Eigenvector Centrality" as size
# # centrality (the most connected words)
# plot(bigrams.network_a,
# layout = layout_with_fr,
# vertex.label.color = "black",
# vertex.label.cex = 0.6,
# vertex.label.dist = 0,
# vertex.size = eigen_a$vector*20,
# edge.arrow.size = 0.1,
# edge.curved = 1,
# edge.color = "gray",
# main = "Bigram Communities (iPhone 13)"
# )
# mtext("Eigenvector Centrality")
#
# # Using "Betweenness" as size
# # Betweenness: (weighted) number of shortest paths going through the node
# plot(bigrams.network_a,
# layout = layout_with_fr,
# vertex.label.color = "black",
# vertex.label.cex = 0.6,
# vertex.label.dist = 0,
# vertex.size = betw_a,
# edge.arrow.size = 0.1,
# edge.curved = 1,
# edge.color = "grey",
# main = "Bigram Communities (iPhone 13)"
# )
# mtext("Betweenness")